Appendix of Synergy-of-Experts: 1 Theoretical Proofs

Neural Information Processing Systems

From Figure 1(a), learning multiple linear sub-models and averaging their predictions (an ensemble) still yields a linear model, so it cannot tackle the XOR problem. We compare the training cost of all methods from two aspects: 1) the sub-model training ensures that most adversarial attacks on the sub-models can be successfully defended. In particular, we train two kinds of models to defend against the attacks. From Figures 2(a) and 2(b), when 0.01 ≤ ϵ ≤ 0.04, SoE without the collaboration training achieves robustness similar to that of full SoE.
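The XOR observation can be checked numerically. The sketch below is a toy illustration (not the paper's code): it trains several linear sub-models independently and shows that averaging their scores is itself still an affine function, so the ensemble cannot fit XOR.

```python
import numpy as np

# The four XOR points and their labels: no single hyperplane separates them.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

def train_linear(seed):
    """Train one linear sub-model (logistic regression) by gradient descent."""
    rng = np.random.default_rng(seed)
    w, b = rng.normal(size=2), 0.0
    for _ in range(2000):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid predictions
        g = p - y                                 # cross-entropy gradient
        w -= 0.125 * (X.T @ g)
        b -= 0.125 * g.sum()
    return w, b

# Averaging the sub-models' scores is itself an affine function of x,
# so the "ensemble" is still one linear classifier.
models = [train_linear(s) for s in range(5)]
avg_w = np.mean([w for w, _ in models], axis=0)
avg_b = np.mean([b for _, b in models])
preds = (X @ avg_w + avg_b > 0).astype(int)

acc = (preds == y).mean()
print(acc)  # at most 0.75: a linear decision boundary cannot solve XOR
```

Any hyperplane misclassifies at least one of the four XOR points, so the averaged model can never reach 100% accuracy, which is exactly why non-linear collaboration between sub-models is needed.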



Adversarially Robust 3D Point Cloud Recognition Using Self-Supervisions: Supplementary Materials

Neural Information Processing Systems

In this section, we introduce the implementation details of the adopted model architectures and self-supervised learning tasks. The EdgeConv layers are stacked to form the DGCNN backbone. As introduced in Section 2.2, we choose k = 3, 4 in this task. We follow the attack setups in [13] to formulate our attack. We provide insights on how different components contribute to the overall improvements.
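A minimal sketch of one EdgeConv layer may help fix ideas. This is an assumption-laden illustration: a plain linear map stands in for DGCNN's shared MLP, and all names, shapes, and the choice k = 4 are for demonstration only.

```python
import numpy as np

def knn(points, k):
    """Indices of the k nearest neighbors of each point (excluding itself)."""
    d = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (N, N)
    np.fill_diagonal(d, np.inf)           # a point is not its own neighbor
    return np.argsort(d, axis=1)[:, :k]   # (N, k)

def edge_conv(points, k, W, b):
    """One EdgeConv layer (illustrative): for each point x_i and neighbor x_j,
    build the edge feature [x_i, x_j - x_i], apply a shared linear map (a
    stand-in for the MLP h_theta), then max-pool over the k neighbors."""
    idx = knn(points, k)                              # (N, k)
    xi = np.repeat(points[:, None, :], k, axis=1)     # (N, k, D)
    xj = points[idx]                                  # (N, k, D)
    edge = np.concatenate([xi, xj - xi], axis=-1)     # (N, k, 2D)
    h = np.maximum(edge @ W + b, 0.0)                 # shared weights + ReLU
    return h.max(axis=1)                              # max over neighbors

rng = np.random.default_rng(0)
pts = rng.normal(size=(32, 3))            # a toy cloud of 32 points in 3D
W, b = rng.normal(size=(6, 16)), np.zeros(16)
out = edge_conv(pts, k=4, W=W, b=b)
print(out.shape)  # (32, 16)
```

Stacking such layers, with the k-NN graph recomputed in feature space at each layer, is what forms the DGCNN backbone referred to above.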



Blurred-Dilated Method for Adversarial Attacks

Neural Information Processing Systems

Deep neural networks (DNNs) are vulnerable to adversarial attacks, which lead to incorrect predictions. In black-box settings, transfer attacks can be conveniently used to generate adversarial examples. However, such examples tend to overfit the specific architecture and feature representations of the source model, resulting in poor attack performance against other target models. To overcome this drawback, we propose a novel model-modification-based transfer attack, the Blurred-Dilated (BD) method. In summary, BD works by reducing downsampling while introducing BlurPool and dilated convolutions into the source model.
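The two ingredients can be sketched on a single channel. This is an illustrative numpy version under assumed 3x3 kernels, not the paper's implementation: BlurPool low-pass filters before subsampling, and dilation enlarges a kernel's receptive field without extra downsampling.

```python
import numpy as np

def blur_pool_1ch(x, stride=2):
    """BlurPool on one channel: smooth with a fixed binomial kernel, then
    subsample. Swapping naive strided downsampling for blur-then-subsample
    is the anti-aliasing step applied to the source model."""
    k1 = np.array([1.0, 2.0, 1.0])
    kernel = np.outer(k1, k1)
    kernel /= kernel.sum()                      # normalized 3x3 binomial filter
    p = np.pad(x, 1, mode="edge")
    blurred = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            blurred[i, j] = (p[i:i + 3, j:j + 3] * kernel).sum()
    return blurred[::stride, ::stride]

def dilated_weights(weights, rate=2):
    """Insert zeros between kernel taps: a 3x3 kernel at rate 2 covers a 5x5
    receptive field, compensating for the downsampling that was removed."""
    k = weights.shape[0]
    out = np.zeros((rate * (k - 1) + 1,) * 2)
    out[::rate, ::rate] = weights
    return out

x = np.arange(36, dtype=float).reshape(6, 6)
y = blur_pool_1ch(x)
print(y.shape)                                   # (3, 3)
print(dilated_weights(np.ones((3, 3))).shape)    # (5, 5)
```

The intuition is that the smoother, higher-resolution feature maps of the modified source model yield gradients that depend less on one architecture's aliasing artifacts, which is what makes the generated examples transfer better.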


DVERGE: Diversifying Vulnerabilities for Enhanced Robust Generation of Ensembles

Neural Information Processing Systems

Recent research finds that CNN models for image classification exhibit overlapping adversarial vulnerabilities: adversarial attacks can mislead CNN models with small perturbations, and these perturbations transfer effectively between different models trained on the same dataset. Adversarial training, as a general robustness improvement technique, eliminates the vulnerability in a single model by forcing it to learn robust features. The process is hard, often requires models with large capacity, and suffers from significant loss of clean-data accuracy. Alternatively, ensemble methods are proposed to induce sub-models with diverse outputs against a transferred adversarial example, making the ensemble robust against transfer attacks even if each sub-model is individually non-robust; only a small clean-accuracy drop is observed in the process. However, previous ensemble training methods are not effective at inducing such diversity and thus fail to produce a robust ensemble. We propose DVERGE, which isolates the adversarial vulnerability in each sub-model by distilling non-robust features, and diversifies the adversarial vulnerabilities to induce diverse outputs against a transfer attack. The novel diversity metric and training procedure enable DVERGE to achieve higher robustness against transfer attacks than previous ensemble methods, and the robustness improves further as more sub-models are added to the ensemble.
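The feature-distillation step can be sketched with a toy linear "feature extractor". This is an assumption for illustration only (DVERGE distills intermediate CNN features): starting from a target input, we search inside a small L-infinity ball for a point whose features match those of a source input, i.e. a point that carries the source's non-robust features while looking like the target.

```python
import numpy as np

def distill_features(W, x_source, x_target, eps=0.1, steps=200, lr=0.01):
    """DVERGE-style feature distillation (sketch): with a linear feature map
    f(x) = W @ x standing in for a sub-model's intermediate layer, minimize
    ||W z - W x_source||^2 over z, projected back onto the L_inf ball of
    radius eps around x_target after every gradient step."""
    z = x_target.copy()
    f_src = W @ x_source
    for _ in range(steps):
        grad = 2 * W.T @ (W @ z - f_src)                 # feature-loss gradient
        z -= lr * grad
        z = np.clip(z, x_target - eps, x_target + eps)   # project to the ball
    return z

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))                 # toy 8-dim input, 4-dim features
x_s, x_t = rng.normal(size=8), rng.normal(size=8)
z = distill_features(W, x_s, x_t)

in_ball = np.abs(z - x_t).max() <= 0.1 + 1e-9
closer = np.linalg.norm(W @ z - W @ x_s) < np.linalg.norm(W @ x_t - W @ x_s)
print(in_ball, closer)  # True True
```

Training each sub-model to classify such distilled inputs by the label of the *target* (rather than the source whose features they carry) is what pushes the sub-models to rely on different non-robust features, so a perturbation that fools one sub-model fails to transfer to the others.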


Text Prompt Injection of Vision Language Models

Zhu, Ruizhe

arXiv.org Artificial Intelligence

The widespread application of large vision language models has significantly raised safety concerns. In this project, we investigate text prompt injection, a simple yet effective method to mislead these models. We developed an algorithm for this type of attack and demonstrated its effectiveness and efficiency through experiments. Compared to other attack methods, our approach is particularly effective against large models while demanding few computational resources.



Evaluating the Robustness of a Production Malware Detection System to Transferable Adversarial Attacks

Nasr, Milad, Fratantonio, Yanick, Invernizzi, Luca, Albertini, Ange, Farah, Loua, Petit-Bianco, Alex, Terzis, Andreas, Thomas, Kurt, Bursztein, Elie, Carlini, Nicholas

arXiv.org Artificial Intelligence

As deep learning models become widely deployed as components within larger production systems, their individual shortcomings can create system-level vulnerabilities with real-world impact. This paper studies how adversarial attacks targeting an ML component can degrade or bypass an entire production-grade malware detection system, performing a case-study analysis of Gmail's pipeline, where file-type identification relies on an ML model. The malware detection pipeline used by Gmail contains a machine learning model that routes each potential malware sample to a specialized malware classifier to improve accuracy and performance. This model, called Magika, has been open-sourced. By designing adversarial examples that fool Magika, we can cause the production malware service to incorrectly route malware to an unsuitable malware detector, thereby increasing our chance of evading detection. Specifically, by changing just 13 bytes of a malware sample, we can evade Magika in 90% of cases and thereby send malware files over Gmail. We then turn our attention to defenses and develop an approach that mitigates the severity of these attacks: against our defended production model, a highly resourced adversary requires 50 bytes to achieve just a 20% attack success rate. We implemented this defense, and, thanks to a collaboration with Google engineers, it has already been deployed in production for the Gmail classifier.
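The routing-evasion idea can be illustrated with a toy byte-signature router. This is entirely hypothetical: Magika is a learned deep model, and the real attack optimizes adversarial bytes against it; the sketch only shows the shape of a minimal-byte-change attack that flips a file-type decision.

```python
def toy_router(data, signatures):
    """Toy stand-in for a learned file-type router: score each type by how
    many of its signature bytes appear at the expected offsets, then route
    the sample to the highest-scoring type."""
    scores = {t: sum(data[off] == byte for off, byte in sig.items())
              for t, sig in signatures.items()}
    return max(scores, key=scores.get)

def greedy_byte_attack(data, signatures, target, budget=16):
    """Greedily rewrite the bytes that raise the target type's score,
    stopping as soon as the router's decision flips: a minimal-byte
    evasion in the spirit of the 13-byte attack described above."""
    data = bytearray(data)
    changed = 0
    for off, byte in signatures[target].items():
        if toy_router(data, signatures) == target or changed >= budget:
            break
        if data[off] != byte:
            data[off] = byte
            changed += 1
    return bytes(data), changed

sigs = {
    "zip": {0: 0x50, 1: 0x4B, 2: 0x03, 3: 0x04},   # 'PK\x03\x04'
    "elf": {0: 0x7F, 1: 0x45, 2: 0x4C, 3: 0x46},   # '\x7fELF'
}
sample = bytes([0x50, 0x4B, 0x03, 0x04]) + bytes(60)  # a zip-like sample
adv, n = greedy_byte_attack(sample, sigs, target="elf")
print(toy_router(adv, sigs), n)  # elf 3
```

In the real system, flipping the routed type means the sample reaches a specialized classifier that was never trained on that kind of file, which is what raises the attacker's odds of slipping past detection.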